Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications

Authors

  • Namgil Lee
  • Jong-Min Kim
Abstract

Many pattern classification algorithms, such as Support Vector Machines (SVMs), MultiLayer Perceptrons (MLPs), and K-Nearest Neighbors (KNNs), require data to consist of purely numerical variables. However, many real-world data sets consist of both categorical and numerical variables. In this paper we suggest an effective method of converting mixed data of categorical and numerical variables into data of purely numerical variables for binary classification. Since the suggested method is based on the theory of learning Bayesian Network Classifiers (BNCs), it is computationally efficient and robust to noise and data loss. The suggested method is also expected to extract sufficient information for estimating a minimum-error-rate (MER) classifier. Simulations on artificial and real-world data sets are conducted to demonstrate the competitiveness of the suggested method when the number of values in each categorical variable is large and BNCs accurately model the data. © 2009 Elsevier B.V. All rights reserved.
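The abstract does not spell out the conversion itself, so the following is only a minimal sketch of one related idea, not the authors' method: under a naive Bayes assumption (the simplest Bayesian network classifier), each category value can be replaced by a smoothed log-odds score, log P(v | y=1) − log P(v | y=0). The function names `fit_log_odds_encoder` and `encode`, the Laplace smoothing parameter `alpha`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_log_odds_encoder(values, labels, alpha=1.0):
    """Map each category to log P(v | y=1) - log P(v | y=0).

    Assumption: a naive Bayes style estimate with Laplace smoothing
    (alpha); this is an illustrative stand-in, not the paper's method.
    """
    categories = sorted(set(values))
    pos = Counter(v for v, y in zip(values, labels) if y == 1)
    neg = Counter(v for v, y in zip(values, labels) if y == 0)
    n_pos = sum(pos.values())
    n_neg = sum(neg.values())
    k = len(categories)
    return {
        v: math.log((pos[v] + alpha) / (n_pos + alpha * k))
        - math.log((neg[v] + alpha) / (n_neg + alpha * k))
        for v in categories
    }


def encode(values, encoder, default=0.0):
    """Replace categories by their scores; unseen values get a neutral 0.0."""
    return [encoder.get(v, default) for v in values]


# Toy data (hypothetical): one categorical variable and a binary label.
colors = ["red", "red", "blue", "green", "blue", "red"]
labels = [1, 1, 0, 0, 0, 1]

encoder = fit_log_odds_encoder(colors, labels)
encoded = encode(colors + ["purple"], encoder)
```

Positive scores mark values that are more frequent in class 1, negative scores mark values more frequent in class 0, and the resulting single numerical column can be passed to an SVM, MLP, or KNN alongside the original numerical variables. A variable with many distinct values still maps to one column, which is where an encoding of this kind is most compact compared with one-hot dummies.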

Similar resources

Using Bayesian networks with rule extraction to infer the risk of weed infestation in a corn-crop

This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed–crop competitiveness using as input categorical variables for the total density of weeds and corresponding proportions of narro...

A New Nonlinear Specification of Structural Breaks for Money Demand in Iran

In a structural time series regression model, binary variables have been used to quantify qualitative or categorical events such as political and economic structural breaks, regions, age groups, etc. The use of binary dummy variables is not reasonable because the effect of an event decreases (increases) gradually over time, not at once. The simple and basic idea in this paper i...

Bayesian Chain Classifiers for Multidimensional Classification

In multidimensional classification the goal is to assign an instance to a set of different classes. This task is normally addressed either by defining a compound class variable with all the possible combinations of classes (label power-set methods, LPMs) or by building independent classifiers for each class (binary-relevance methods, BRMs). However, LPMs do not scale well and BRMs ignore the de...

Risk Analysis of Operating Room Using the Fuzzy Bayesian Network Model

To enhance patient safety, we need effective methods for risk management. This work aims to propose an integrated approach to risk management for a hospital system. To improve patient safety, we should develop flexible methods where different aspects of risk and types of information are taken into consideration. This paper proposes a fuzzy Bayesian network to model and analyze risk in the op...

Learning Bayesian Network Structure using Markov Blanket in K2 Algorithm

A Bayesian network is a graphical model that represents a set of random variables and their causal relationships via a Directed Acyclic Graph (DAG). There are basically two methods used for learning a Bayesian network: parameter learning and structure learning. One of the most effective structure-learning methods is the K2 algorithm. Because the performance of the K2 algorithm depends on node...

Journal:
  • Computational Statistics & Data Analysis

Volume 54

Publication date: 2010